Text categorization using lexical chains

Author

  • Tue Haste Andersen
Abstract

In this report I present a prototype system for use in dynamic text categorization research. The system implements lexical chaining as described in recent literature. On top of this, a simple extension automatically identifies one or several categories in which to place a given text. The initial tests presented in this report do not give any useful results; however, they give rise to new questions and possible directions for future research on lexical chaining and its uses in text categorization. Along with the implementation, previous research and the lexicographic database WordNet are discussed.

Similar resources

Empirical Textual Mining to Protein Entities Recognition from PubMed Corpus

Wednesday, June 15th 8:00 Conference Registration (Registration desk) 8:45 Session 1: Large-Scale Online Linguistic Resources (I) Chair: "Text Categorization Based on Subtopic Clusters" Francis Chik, Robert Luk, Korris Chung "Automatic Filtering of Bilingual Corpora for Statistical Machine Translation" Shahram Khadivi, Hermann Ney "The Role of Word Sense Disambiguation in Automated Text Categor...

Interaction Chain Patterns of Online Text Construction with Lexical Cohesion

This study aims at arousing college students’ metacognition in detecting lexical cohesion during online text construction as WordNet served as a lexical resource. A total of 83 students were requested to construct texts through sequences of actions identified as interaction chains in this study. Interaction chains are grouped and categorized as a meaningful entity in order to investigate the st...

A supervised approach to keyword extraction from Persian documents using lexical chains

Keywords are the main focal points of interest within a text, which intends to represent the principal concepts outlined in the document. Determining the keywords using traditional methods is a time consuming process and requires specialized knowledge of the subject. For the purposes of indexing the vast expanse of electronic documents, it is important to automate the keyword extraction task. S...

Using Genetic Algorithms with Lexical Chains for Automatic Text Summarization

Automatic text summarization takes an input text and extracts the most important content in the text. Determining the importance of information depends on several factors. In this paper, we combine two different approaches that have been used in the text summarization domain. The first one is using genetic algorithms to learn the patterns in the documents that lead to the summaries. The other o...

Minimal training based semantic categorization in a voice activated question answering (VAQA) system

In this paper, we develop a knowledge based methodology that maps Automatic Speech Recognizer (ASR) transcriptions to predefined semantic categories in a Voice Activated Question Answering (VAQA) system. The proposed semantic categorization methodology, SemCat, uses a novel lexical chains/ontology based algorithm and relies heavily on customized but domain independent Natural Language Processin...

Publication date: 2000